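Clinically, accurate annotation of lesions and tissues can significantly facilitate disease diagnosis. For example, segmenting the optic disc/cup (OD/OC) in fundus images helps diagnose glaucoma, and segmenting skin lesions in dermoscopic images helps diagnose melanoma. With the advancement of deep learning, a wide range of methods have shown that lesion/tissue segmentation can also benefit automated disease diagnosis models. However, existing methods are limited in that they can only capture static regional correlations in the images. Inspired by the global and dynamic nature of the Vision Transformer, in this paper we propose the Segmentation-Assisted diagnosis Transformer (SeaTrans) to transfer segmentation knowledge to the disease diagnosis network. Specifically, we first propose an asymmetric multi-scale interaction strategy to correlate each single low-level diagnosis feature with multi-scale segmentation features. Then, an effective strategy called the Sea block is adopted to vitalize the diagnosis features via the correlated segmentation features. To model the segmentation-diagnosis interaction, the Sea block first embeds the diagnosis feature based on the segmentation information via an encoder, and then transfers the embedding back to the diagnosis feature space via a decoder. Experimental results demonstrate that SeaTrans surpasses a wide range of state-of-the-art (SOTA) segmentation-assisted diagnosis methods on several disease diagnosis tasks.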
translated by Google Translate
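Segmentation of the optic disc (OD) and optic cup (OC) in fundus images is an important fundamental task for glaucoma diagnosis. In clinical practice, it is often necessary to collect opinions from multiple experts to obtain the final OD/OC annotation. This clinical routine helps mitigate individual bias. However, when data are annotated by multiple raters, standard deep learning models are no longer directly applicable. In this paper, we propose a novel neural network framework to learn OD/OC segmentation from multi-rater annotations. The segmentation results are self-calibrated by iteratively optimizing the estimation of multi-rater expertness and the calibrated OD/OC segmentation. In this way, the proposed method achieves mutual improvement of the two tasks and finally obtains a refined segmentation result. Specifically, we propose the Diverging Model (DIVM) and the Converging Model (CONM) to handle the two tasks respectively. CONM segments the raw image based on the multi-rater expertness map provided by DIVM. DIVM generates the multi-rater expertness map from the segmentation mask provided by CONM. Experimental results show that by running CONM and DIVM recurrently, the results can be self-calibrated so as to outperform a range of state-of-the-art (SOTA) multi-rater segmentation methods.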
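Pre-training is essential to the performance of deep learning models, especially in medical image analysis tasks where training data are limited. However, existing pre-training methods are inflexible, because the pre-trained weights of one model cannot be reused by other network architectures. In this paper, we propose an architecture-irrelevant hyper-initializer, which can initialize any given network architecture well after being pre-trained only once. The proposed initializer is a hypernetwork that takes a downstream architecture as an input graph and outputs the initialization parameters of that architecture. We demonstrate the effectiveness and efficiency of the hyper-initializer through extensive experiments on multiple medical imaging modalities, especially in data-limited settings. Moreover, we show that the proposed algorithm can be reused as a favorable plug-and-play initializer for any downstream architecture and task (both classification and segmentation) of the same modality.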
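With the advancement of deep learning, an increasing number of methods have been proposed for optic disc and cup (OD/OC) segmentation from fundus images. Clinically, OD/OC segmentation is often annotated by multiple clinical experts to mitigate personal bias. However, it is hard to train automated deep learning models on multiple labels. A common practice to tackle this issue is majority voting, e.g., taking the average of the multiple labels. However, such a strategy ignores the differing expertness of the medical experts. Motivated by the observation that OD/OC segmentation is often used clinically for glaucoma diagnosis, in this paper we propose a novel strategy to fuse multi-rater OD/OC segmentation labels via glaucoma diagnosis performance. Specifically, we assess the expertness of each rater through an attentive glaucoma diagnosis network. For each rater, the contribution to the diagnosis is reflected as an expertness map. To ensure that the expertness maps generalize across different glaucoma diagnosis models, we further propose an Expertness Generator (EXPG) to eliminate high-frequency components during optimization. Based on the obtained expertness maps, the multi-rater labels can be fused into a single ground truth, which we call the Diagnosis First Ground-truth (diagfirstgt). Experimental results show that by using diagfirstgt as the ground truth, OD/OC segmentation networks predict masks with superior glaucoma diagnosis performance.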
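The contrast between plain averaging and expertness-weighted fusion can be illustrated with a minimal Python sketch. It assumes the expertness maps are already available as per-pixel non-negative weights; in the abstract they come from a diagnosis network, which is not modeled here. The names `fuse_labels` and `average_labels` and the toy values are hypothetical, and this is not the authors' implementation.

```python
from typing import List

Mask = List[List[float]]  # 2-D grid of per-pixel labels or weights


def fuse_labels(masks: List[Mask], expertness: List[Mask]) -> Mask:
    """Per-pixel weighted average of rater masks.

    `expertness[i][r][c]` is an assumed non-negative weight for rater i at
    pixel (r, c). Weights are normalised across raters at each pixel.
    """
    rows, cols = len(masks[0]), len(masks[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = sum(w[r][c] for w in expertness)
            if total == 0:
                # No rater is trusted here: fall back to the plain average.
                fused[r][c] = sum(m[r][c] for m in masks) / len(masks)
            else:
                weighted = sum(m[r][c] * w[r][c]
                               for m, w in zip(masks, expertness))
                fused[r][c] = weighted / total
    return fused


def average_labels(masks: List[Mask]) -> Mask:
    """Baseline: every rater gets the same weight at every pixel."""
    rows, cols = len(masks[0]), len(masks[0][0])
    uniform = [[[1.0] * cols for _ in range(rows)] for _ in masks]
    return fuse_labels(masks, uniform)


# Toy 1x2 "image" labelled by three raters (hypothetical values).
masks = [[[1.0, 0.0]], [[0.0, 0.0]], [[1.0, 1.0]]]
# Rater 3 is assumed to be twice as reliable as the other two.
weights = [[[1.0, 1.0]], [[1.0, 1.0]], [[2.0, 2.0]]]

baseline = average_labels(masks)
fused = fuse_labels(masks, weights)
```

With uniform weights the fusion reduces to the averaging baseline, so the two strategies differ only in where the weights come from.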
Existing federated classification algorithms typically assume the local annotations at every client cover the same set of classes. In this paper, we aim to lift such an assumption and focus on a more general yet practical non-IID setting where every client can work on non-identical and even disjoint sets of classes (i.e., client-exclusive classes), and the clients have a common goal which is to build a global classification model to identify the union of these classes. Such heterogeneity in client class sets poses a new challenge: how to ensure different clients are operating in the same latent space so as to avoid the drift after aggregation? We observe that the classes can be described in natural languages (i.e., class names) and these names are typically safe to share with all parties. Thus, we formulate the classification problem as a matching process between data representations and class representations and break the classification model into a data encoder and a label encoder. We leverage the natural-language class names as the common ground to anchor the class representations in the label encoder. In each iteration, the label encoder updates the class representations and regulates the data representations through matching. We further use the updated class representations at each round to annotate data samples for locally-unaware classes according to similarity and distill knowledge to local models. Extensive experiments on four real-world datasets show that the proposed method can outperform various classical and state-of-the-art federated learning methods designed for learning with non-IID data.
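Classification as a matching process between data representations and class representations can be sketched in a few lines of Python. The sketch assumes both encoders have already produced fixed-length vectors and uses cosine similarity as the matching score; the similarity function, the function names, and the toy vectors are assumptions for illustration, not details taken from the paper.

```python
import math
from typing import Dict, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(data_repr: Sequence[float],
             class_reprs: Dict[str, Sequence[float]]) -> str:
    """Return the class name whose representation best matches the sample."""
    return max(class_reprs, key=lambda name: cosine(data_repr, class_reprs[name]))


# Hypothetical class representations, standing in for label-encoder outputs
# computed from the class names.
class_reprs = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}

# Hypothetical data-encoder output for one sample.
prediction = classify([0.9, 0.1], class_reprs)
```

Because every client matches against representations anchored by the same class names, a client can in principle score samples against classes it never saw locally, which is the property the abstract relies on for annotating locally-unaware classes.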
Existing measures and representations for trajectories have two longstanding fundamental shortcomings, i.e., they are computationally expensive and they cannot guarantee the "uniqueness" property of a distance function: dist(X,Y) = 0 if and only if X = Y, where X and Y are two trajectories. This paper proposes a simple yet powerful way to represent trajectories and measure the similarity between two trajectories using a distributional kernel to address these shortcomings. It is a principled approach based on kernel mean embedding, which has a strong theoretical underpinning. It has three distinctive features in comparison with existing approaches. (1) A distributional kernel is used for the very first time for trajectory representation and similarity measurement. (2) It does not rely on point-to-point distances, which are used in most existing distances for trajectories. (3) It requires no learning, unlike existing learning and deep learning approaches. We show the generality of this new approach in three applications: (a) trajectory anomaly detection, (b) anomalous sub-trajectory detection, and (c) trajectory pattern mining. We identify that the distributional kernel has (i) a unique data-dependent property and the above uniqueness property, which are the key factors that lead to its superior task-specific performance; and (ii) a runtime that is orders of magnitude faster than existing distance measures.
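A kernel mean embedding comparison of two trajectories can be sketched as follows. Each trajectory is treated as a sample of points, and the distributional kernel is the average of a point kernel over all cross pairs. The Gaussian point kernel and its `bandwidth` parameter are assumptions chosen for simplicity; the abstract does not name the point kernel, and the data-dependent property it mentions is not reproduced by this choice.

```python
import math
from typing import Sequence

Point = Sequence[float]
Trajectory = Sequence[Point]


def gaussian_kernel(x: Point, y: Point, bandwidth: float = 1.0) -> float:
    """Gaussian point kernel (an assumed stand-in for the paper's kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def distributional_kernel(X: Trajectory, Y: Trajectory,
                          bandwidth: float = 1.0) -> float:
    """Inner product of the two kernel mean embeddings:
    the mean of k(x, y) over all pairs x in X, y in Y."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in X for y in Y)
    return total / (len(X) * len(Y))


def embedding_distance_sq(X: Trajectory, Y: Trajectory,
                          bandwidth: float = 1.0) -> float:
    """Squared distance between the mean embeddings of X and Y."""
    return (distributional_kernel(X, X, bandwidth)
            + distributional_kernel(Y, Y, bandwidth)
            - 2.0 * distributional_kernel(X, Y, bandwidth))


# Two toy 2-D trajectories (hypothetical coordinates).
traj_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
traj_b = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]

same = embedding_distance_sq(traj_a, traj_a)
different = embedding_distance_sq(traj_a, traj_b)
```

Note that no point in one trajectory is aligned with a specific point in the other: the score is an average over all cross pairs rather than a sum of matched point-to-point distances. This brute-force version costs one kernel evaluation per pair of points and says nothing about the runtime reported in the abstract.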
Natural Language Processing (NLP) has been revolutionized by the use of Pre-trained Language Models (PLMs) such as BERT. Despite setting new records in nearly every NLP task, PLMs still face a number of challenges including poor interpretability, weak reasoning capability, and the need for a lot of expensive annotated data when applied to downstream tasks. By integrating external knowledge into PLMs, Knowledge-Enhanced Pre-trained Language Models (KEPLMs) have the potential to overcome the above-mentioned limitations. In this paper, we examine KEPLMs systematically through a series of studies. Specifically, we outline the common types and different formats of knowledge to be integrated into KEPLMs, detail the existing methods for building and evaluating KEPLMs, present the applications of KEPLMs in downstream tasks, and discuss the future research directions. Researchers will benefit from this survey by gaining a quick and comprehensive overview of the latest developments in this field.
Autonomous robotic surgery has advanced significantly based on analysis of visual and temporal cues in surgical workflow, but relational cues from domain knowledge remain under investigation. Complex relations in surgical annotations can be divided into intra- and inter-relations, both valuable to autonomous systems to comprehend surgical workflows. Intra- and inter-relations describe the relevance of various categories within a particular annotation type and the relevance of different annotation types, respectively. This paper aims to systematically investigate the importance of relational cues in surgery. First, we contribute the RLLS12M dataset, a large-scale collection of robotic left lateral sectionectomy (RLLS), by curating 50 videos of 50 patients operated by 5 surgeons and annotating a hierarchical workflow, which consists of 3 inter- and 6 intra-relations, 6 steps, 15 tasks, and 38 activities represented as the triplet of 11 instruments, 8 actions, and 16 objects, totaling 2,113,510 video frames and 12,681,060 annotation entities. Correspondingly, we propose a multi-relation purification hybrid network (MURPHY), which aptly incorporates novel relation modules to augment the feature representation by purifying relational features using the intra- and inter-relations embodied in annotations. The intra-relation module leverages an R-GCN to implant visual features in different graph relations, which are aggregated using a targeted relation purification with affinity information measuring label consistency and feature similarity. The inter-relation module is motivated by attention mechanisms to regularize the influence of relational features based on the hierarchy of annotation types from the domain knowledge. Extensive experimental results on the curated RLLS dataset confirm the effectiveness of our approach, demonstrating that relations matter in surgical workflow analysis.
Deep learning-based methods have achieved significant performance for image defogging. However, existing methods are mainly developed for land scenes and perform poorly when dealing with overwater foggy images, since overwater scenes typically contain large expanses of sky and water. In this work, we propose a Prior map Guided CycleGAN (PG-CycleGAN) for defogging of images with overwater scenes. To promote the recovery of the objects on water in the image, two loss functions are exploited for the network where a prior map is designed to invert the dark channel and the min-max normalization is used to suppress the sky and emphasize objects. However, due to the unpaired training set, the network may learn an under-constrained domain mapping from foggy to fog-free image, leading to artifacts and loss of details. Thus, we propose an intuitive Upscaling Inception Module (UIM) and a Long-range Residual Coarse-to-fine framework (LRC) to mitigate this issue. Extensive experiments on qualitative and quantitative comparisons demonstrate that the proposed method outperforms the state-of-the-art supervised, semi-supervised, and unsupervised defogging approaches.
Code generation models have achieved impressive performance. However, they tend to be brittle as slight edits to a prompt could lead to very different generations; these robustness properties, critical for user experience when deployed in real-life applications, are not well understood. Most existing works on robustness in text or code tasks have focused on classification, while robustness in generation tasks is an uncharted area and to date there is no comprehensive benchmark for robustness in code generation. In this paper, we propose ReCode, a comprehensive robustness evaluation benchmark for code generation models. We customize over 30 transformations specifically for code on docstrings, function and variable names, code syntax, and code format. They are carefully designed to be natural in real-life coding practice, preserve the original semantic meaning, and thus provide multifaceted assessments of a model's robustness performance. With human annotators, we verified that over 90% of the perturbed prompts do not alter the semantic meaning of the original prompt. In addition, we define robustness metrics for code generation models considering the worst-case behavior under each type of perturbation, taking advantage of the fact that executing the generated code can serve as objective evaluation. We demonstrate ReCode on SOTA models using HumanEval, MBPP, as well as function completion tasks derived from them. Interesting observations include: better robustness for CodeGen over InCoder and GPT-J; models are most sensitive to syntax perturbations; more challenging robustness evaluation on MBPP over HumanEval.
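The worst-case idea behind the robustness metrics can be sketched in Python under simplifying assumptions: one generation per prompt, and execution results already reduced to pass/fail booleans. A problem counts as robustly solved only if the generated code passes on every perturbed variant of its prompt. The function names and toy results below are hypothetical, and the benchmark's actual metric definitions may differ in detail.

```python
from typing import Dict, List


def pass_rate(nominal: Dict[str, bool]) -> float:
    """Fraction of problems solved on the original, unperturbed prompts."""
    return sum(nominal.values()) / len(nominal)


def robust_pass_rate(perturbed: Dict[str, List[bool]]) -> float:
    """Worst-case pass rate: a problem counts only if the generated code
    passes under every perturbed variant of its prompt."""
    return sum(all(results) for results in perturbed.values()) / len(perturbed)


def relative_drop(nominal_rate: float, robust_rate: float) -> float:
    """Relative loss in pass rate caused by the perturbations."""
    if nominal_rate == 0.0:
        return 0.0
    return (nominal_rate - robust_rate) / nominal_rate


# Hypothetical execution outcomes for three problems.
nominal = {"p1": True, "p2": True, "p3": False}
# Two perturbed prompts per problem (e.g. renamed variables, edited docstring).
perturbed = {
    "p1": [True, True],
    "p2": [True, False],   # breaks under one perturbation
    "p3": [False, False],
}

base = pass_rate(nominal)
robust = robust_pass_rate(perturbed)
drop = relative_drop(base, robust)
```

Taking the worst case over perturbations, rather than the average, means a single failing variant is enough to mark a problem as not robustly solved.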